CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling
Abstract
Clustering in data mining is a discovery process that groups a set of data such that the intracluster similarity is maximized and the intercluster similarity is minimized. Existing clustering algorithms, such as K-means, PAM, CLARANS, DBSCAN, CURE, and ROCK, are designed to find clusters that fit some static model. These algorithms can break down if the parameters chosen for the static model are incorrect for the data set being clustered, or if the model is not adequate to capture the characteristics of the clusters. Furthermore, most of these algorithms break down when the data consists of clusters of diverse shapes, densities, and sizes. In this paper, we present a novel hierarchical clustering algorithm called CHAMELEON that measures the similarity of two clusters based on a dynamic model. In the clustering process, two clusters are merged only if the inter-connectivity and closeness (proximity) between the two clusters are high relative to the internal inter-connectivity of the clusters and the closeness of items within the clusters. The merging process using the dynamic model presented in this paper facilitates the discovery of natural and homogeneous clusters. The methodology of dynamic modeling of clusters used in CHAMELEON is applicable to all types of data as long as a similarity matrix can be constructed. We demonstrate the effectiveness of CHAMELEON on a number of data sets that contain points in 2D space and that contain clusters of different shapes, densities, and sizes, along with noise and artifacts. Experimental results on these data sets show that CHAMELEON can discover natural clusters that many existing state-of-the-art clustering algorithms fail to find.
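The merge rule described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: inter-connectivity is taken as the total similarity between or within clusters, closeness as the average similarity, and the function names and the thresholds `min_conn` and `min_close` are hypothetical. This is not the paper's exact definition, only one way to compare cross-cluster measures against internal ones.

```python
from itertools import combinations


def cross_weights(sim, a, b):
    """Nonzero similarities for pairs with one item in cluster a and one in b."""
    return [sim[i][j] for i in a for j in b if sim[i][j] > 0]


def internal_weights(sim, c):
    """Nonzero similarities for pairs of items inside a single cluster c."""
    return [sim[i][j] for i, j in combinations(c, 2) if sim[i][j] > 0]


def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0


def relative_scores(sim, a, b):
    """Return (relative inter-connectivity, relative closeness) for a and b.

    Assumed simplification: connectivity = sum of similarities,
    closeness = mean similarity, each normalized by the average of the
    same measure computed inside the two clusters.
    """
    cross = cross_weights(sim, a, b)
    ia, ib = internal_weights(sim, a), internal_weights(sim, b)
    internal_conn = (sum(ia) + sum(ib)) / 2
    internal_close = (mean(ia) + mean(ib)) / 2
    rel_conn = sum(cross) / internal_conn if internal_conn else 0.0
    rel_close = mean(cross) / internal_close if internal_close else 0.0
    return rel_conn, rel_close


def should_merge(sim, a, b, min_conn=0.5, min_close=0.5):
    """Merge only if BOTH relative measures are high (hypothetical thresholds)."""
    rel_conn, rel_close = relative_scores(sim, a, b)
    return rel_conn >= min_conn and rel_close >= min_close


# Two tight pairs joined by one weak link: should stay separate.
apart = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
]
# Four items that are all equally similar: the two halves should merge.
uniform = [[0.0 if i == j else 0.8 for j in range(4)] for i in range(4)]

print(should_merge(apart, [0, 1], [2, 3]))    # False
print(should_merge(uniform, [0, 1], [2, 3]))  # True
```

Because both scores are normalized by each pair's own internal measures, the same thresholds apply to dense and sparse clusters alike, which is the point of a dynamic model as opposed to a fixed global distance cutoff.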
Similar resources
The New Software Package for Dynamic Hierarchical Clustering for Circles Types of Shapes
In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active themes of research focus on the scalability of clustering methods, the effectiveness of methods for clustering complex shapes and types of data, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases. ...
A Modified Multilevel Approach to the Dynamic Hierarchical Clustering for Complex types of Shapes
In data mining, efforts have focused on finding methods for efficient and effective cluster analysis in large databases. Active themes of research focus on the scalability of clustering methods, the effectiveness of methods for clustering complex shapes and types of data, high-dimensional clustering techniques, and methods for clustering mixed numerical and categorical data in large databases. ...
Parallel Algorithm for the Chameleon Clustering Algorithm using Dynamic Modeling
With the increasing size of data sets in application areas such as biomedicine, hospitals, information systems, scientific data processing and prediction, finance analytics, communications, retail, and marketing, it is becoming increasingly important to execute data mining tasks in parallel. At the same time, technological advancements have made shared-memory parallel computation machines commonly ...
A Novel Hybrid Clustering Method Using Artificial Immune Systems and Hierarchical Clustering
The artificial immune system (AIS) is one of the meta-heuristic algorithms used to solve complex problems. With large amounts of data, making rapid decisions and obtaining stable results are the most challenging tasks because of rapid variation in the real world. Clustering is a possible solution for overcoming these problems. The goal of cluster analysis is to group similar objects. AIS algor...
Chameleon: Hierarchical Clustering Using Dynamic Modeling
Clustering is a discovery process in data mining [1]. It groups a set of data in a way that maximizes the similarity within clusters and minimizes the similarity between two different clusters [1,2]. These discovered clusters can help explain the characteristics of the underlying data distribution and serve as the foundation for other data mining and analysis techniques. Clustering is ...
Publication date: 1999